
    Integrating adaptive on-chip storage structures for reduced dynamic power

    Energy efficiency in microarchitectures has become a necessity. Significant dynamic energy savings can be realized in adaptive storage structures such as caches, issue queues, and register files by disabling unnecessary storage resources. Prior studies have analyzed individual structures and their control. A common theme of these studies is exploration of the configuration space, using system IPC as feedback to guide reconfiguration. However, when multiple structures adapt in concert, the number of possible configurations increases dramatically, and attributing changes in IPC to a particular structure becomes problematic. To overcome this issue, we introduce designs that are reconfigured solely on local behavior. We introduce a novel cache design that permits direct calculation of efficient configurations. For buffer and queue structures, limited histogramming permits precise resizing control. Applying these techniques, we show energy savings of up to 70% on the individual structures, and savings averaging 30% of the energy attributed to these structures overall, with an average performance degradation of 2.1%.
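    The "limited histogramming" mentioned above can be sketched as follows: sample the structure's occupancy, bucket the samples by the available partition sizes, and pick the smallest size that covers nearly all samples. The bucket sizes, sample window, and 99% coverage threshold below are illustrative assumptions, not values from the paper.

```python
def choose_queue_size(occupancy_samples, sizes=(16, 32, 48, 64), coverage=0.99):
    """Pick the smallest configurable size covering `coverage` of samples."""
    # Histogram occupancy samples into buckets bounded by the sizes.
    counts = {s: 0 for s in sizes}
    for occ in occupancy_samples:
        for s in sizes:
            if occ <= s:
                counts[s] += 1
                break
    # Walk buckets smallest-first until the coverage target is met.
    total = len(occupancy_samples)
    cumulative = 0
    for s in sizes:
        cumulative += counts[s]
        if cumulative / total >= coverage:
            return s
    return sizes[-1]
```

    For example, if 80% of samples fit in 16 entries but the rest need up to 32, the policy keeps 32 entries powered and disables the upper half of a 64-entry structure.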

    Dynamically Reducing Pressure on the Physical Register File through Simple Register Sharing (Best Paper Award)

    Using register renaming and physical registers, modern microprocessors eliminate false data dependences arising from reuse of the instruction-set-defined registers (logical registers). High-performance processors that have longer pipelines and a greater capacity to exploit instruction-level parallelism have more instructions in flight and require more physical registers. Simultaneous multithreading architectures further exacerbate this register pressure. This paper evaluates two register sharing techniques for reducing register usage. The first technique dynamically combines physical registers holding the same value. The second technique combines the demand of several instructions updating the same logical register and shares physical register storage among them. While similar techniques have been proposed previously, an important contribution of this paper is to exploit only special cases that provide most of the benefits of more general solutions at very low hardware complexity. Despite this simplicity, our design reduces the required number of physical registers by more than 10% on some applications, and provides almost half of the total benefit of an aggressive (complex) scheme. More importantly, we show that the simpler design has a significant performance effect in a simultaneous multithreaded (SMT) architecture, where register availability can be a bottleneck. Our results show an average 25.7% performance improvement for an SMT architecture with 160 registers or, equivalently, performance similar to an SMT with 200 registers (25% more) but no register sharing.
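    The first sharing technique can be illustrated with a toy rename map that gives two logical registers holding the same value one physical register, tracked by a reference count. Restricting sharing to an easily detected special case (a register-to-register move, in this sketch) is the low-complexity angle the abstract describes; this is an illustrative simplification, not the paper's hardware design.

```python
class RenameMap:
    """Toy rename map with value sharing for register-move instructions."""

    def __init__(self, num_phys):
        self.free = list(range(num_phys))  # free physical registers
        self.refcount = [0] * num_phys     # sharers per physical register
        self.map = {}                      # logical reg -> physical reg

    def write(self, logical, value_source=None):
        """Rename `logical`; for a move, share the source's register."""
        self._release(logical)
        if value_source is not None and value_source in self.map:
            # Register move: the destination shares the source's register
            # instead of allocating a new one.
            phys = self.map[value_source]
        else:
            phys = self.free.pop()
        self.refcount[phys] += 1
        self.map[logical] = phys
        return phys

    def _release(self, logical):
        # Free a physical register only when its last sharer is renamed.
        phys = self.map.pop(logical, None)
        if phys is not None:
            self.refcount[phys] -= 1
            if self.refcount[phys] == 0:
                self.free.append(phys)
```

    A move `r2 <- r1` then consumes no new physical register, which is exactly how sharing relieves pressure when free registers are scarce.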

    Real-Time Penalties in RISC Processing

    The RISC processor features that provide high performance are probabilistic (e.g., caches, TLBs, write buffers, branch prediction), so worst-case analysis in real-time systems must regularly assume the pathological conditions under which these features perform poorly (e.g., every cache access conflicts). This report presents analytical results on the performance penalties incurred by worst-case execution time (WCET) estimates for RISC processors in real-time systems. The results clearly indicate where efforts should be made to reduce variability in processor designs.
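    The penalty the report quantifies can be seen in a back-of-the-envelope comparison: the same access stream costed once with a realistic cache hit rate and once under the worst-case assumption that every access misses. The latencies and hit rate below are made-up example numbers, not figures from the report.

```python
def exec_cycles(accesses, hit_rate, hit_cycles=1, miss_cycles=20):
    """Memory-access cycles under a given cache hit rate."""
    hits = accesses * hit_rate
    misses = accesses - hits
    return hits * hit_cycles + misses * miss_cycles

typical = exec_cycles(1_000_000, hit_rate=0.95)  # realistic behavior
wcet    = exec_cycles(1_000_000, hit_rate=0.0)   # every access a miss
penalty = wcet / typical                         # pessimism factor
```

    With these example numbers the WCET bound is roughly an order of magnitude above typical behavior, which is why reducing variability in the hardware matters more for real-time systems than raising average performance.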

    Comparing Caching Techniques for Multitasking Real-Time Systems

    Correctness in real-time computing depends on both the logical result and the time when it is available. Real-time operating systems need to know the timing behavior of applications to ensure correct real-time system behavior; thus, predictability in the underlying hardware is required. Unfortunately, the standard cache management policies in embedded microprocessors are designed for excellent probabilistic behavior but lack predictability, especially in a multitasking environment. In this article we examine the two popular cache management policies that support predictable cache behavior in a multitasking environment and quantitatively compare them. Using a novel application of an existing analytical cache model, we show that neither policy is best in general, and we delimit the system characteristics where each is most effective.
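    The abstract does not name the two policies here, so the sketch below only illustrates one ingredient common to such analytical comparisons: the reload cost a task pays after preemption in a shared cache versus a per-task partition. The line count and miss latency are illustrative assumptions.

```python
def shared_reload_penalty(working_set_lines, miss_cycles=20):
    """Cycles to refill a working set evicted by other tasks during preemption."""
    return working_set_lines * miss_cycles

def partitioned_reload_penalty(working_set_lines):
    """With a dedicated cache partition, preemption evicts nothing."""
    return 0

# Example: a task with a 512-line working set pays 10,240 cycles per
# preemption under worst-case sharing, and none with a private partition.
```

    A comparison along these lines trades that predictable zero reload cost against the smaller effective cache each task gets from partitioning, which is why neither policy wins in general.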

